Overview

Dataset statistics

Number of variables: 11
Number of observations: 5703
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 534.7 KiB
Average record size in memory: 96.0 B

Variable types

Numeric: 11

Alerts

gross_revenue is highly correlated with qtd_purchases and 1 other field (High correlation)
qtd_purchases is highly correlated with gross_revenue and 1 other field (High correlation)
qtd_products is highly correlated with gross_revenue and 3 other fields (High correlation)
avg_products is highly correlated with qtd_products and 1 other field (High correlation)
avg_unique_products is highly correlated with avg_products (High correlation)
recency_days is highly correlated with customer_id (High correlation)
purchase_frequency is highly correlated with qtd_purchases (High correlation)
qtd_cancelled_products is highly correlated with qtd_products and 2 other fields (High correlation)
cancelled_over_purchased is highly correlated with qtd_cancelled_products (High correlation)
customer_id is highly correlated with recency_days (High correlation)
gross_revenue is highly skewed (γ1 = 23.72814196) (Skewed)
mean_ticket is highly skewed (γ1 = 46.60155412) (Skewed)
qtd_products is highly skewed (γ1 = 23.07183204) (Skewed)
avg_products is highly skewed (γ1 = 48.56980066) (Skewed)
qtd_cancelled_products is highly skewed (γ1 = 52.07018364) (Skewed)
customer_id has unique values (Unique)
qtd_cancelled_products has 4200 (73.6%) zeros (Zeros)
cancelled_over_purchased has 4200 (73.6%) zeros (Zeros)

Reproduction

Analysis started: 2022-11-23 22:18:17.651921
Analysis finished: 2022-11-23 22:18:43.851409
Duration: 26.2 seconds
Software version: pandas-profiling v3.4.0
Download configuration: config.json

Variables

customer_id
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 5703
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 16602.87585
Minimum: 12346
Maximum: 22709
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 89.1 KiB

Quantile statistics

Minimum: 12346
5th percentile: 12697.3
Q1: 14287.5
Median: 16229
Q3: 18214
95th percentile: 21746.6
Maximum: 22709
Range: 10363
Interquartile range (IQR): 3926.5

Descriptive statistics

Standard deviation: 2811.617168
Coefficient of variation (CV): 0.1693451901
Kurtosis: -0.8238803827
Mean: 16602.87585
Median Absolute Deviation (MAD): 1964
Skewness: 0.4409657161
Sum: 94686201
Variance: 7905191.098
Monotonicity: Not monotonic
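
Several of the figures reported for customer_id are derived from others in the same block, so they can be cross-checked by hand. The following is a minimal sketch in plain Python that uses only the values printed above; the variable names are illustrative, and small differences are expected because the report rounds what it displays.

```python
# Values copied from the customer_id statistics in this report.
minimum, maximum = 12346, 22709
q1, q3 = 14287.5, 18214
mean, std = 16602.87585, 2811.617168

value_range = maximum - minimum   # reported as Range
iqr = q3 - q1                     # reported as Interquartile range (IQR)
cv = std / mean                   # reported as Coefficient of variation (CV)
variance = std ** 2               # variance is the squared standard deviation

print(value_range, iqr, cv, variance)
```

The same four relationships apply to every variable in the report, so this check can be repeated with the figures from any other block.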
Histogram with fixed size bins (bins=50)

Value                 Count  Frequency (%)
17850                 1      < 0.1%
14383                 1      < 0.1%
21091                 1      < 0.1%
17123                 1      < 0.1%
17891                 1      < 0.1%
16498                 1      < 0.1%
13745                 1      < 0.1%
15584                 1      < 0.1%
21089                 1      < 0.1%
21088                 1      < 0.1%
Other values (5693)   5693   99.8%

Value                 Count  Frequency (%)
12346                 1      < 0.1%
12347                 1      < 0.1%
12348                 1      < 0.1%
12349                 1      < 0.1%
12350                 1      < 0.1%
12352                 1      < 0.1%
12353                 1      < 0.1%
12354                 1      < 0.1%
12355                 1      < 0.1%
12356                 1      < 0.1%

Value                 Count  Frequency (%)
22709                 1      < 0.1%
22708                 1      < 0.1%
22707                 1      < 0.1%
22706                 1      < 0.1%
22705                 1      < 0.1%
22704                 1      < 0.1%
22700                 1      < 0.1%
22699                 1      < 0.1%
22696                 1      < 0.1%
22695                 1      < 0.1%

gross_revenue
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED

Distinct: 5458
Distinct (%): 95.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1719.97418
Minimum: 0
Maximum: 278778.02
Zeros: 7
Zeros (%): 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 89.1 KiB

Quantile statistics

Minimum: 0
5th percentile: 12.596
Q1: 230.52
Median: 603.42
Q3: 1544.88
95th percentile: 5190.786
Maximum: 278778.02
Range: 278778.02
Interquartile range (IQR): 1314.36

Descriptive statistics

Standard deviation: 7360.945437
Coefficient of variation (CV): 4.279683685
Kurtosis: 740.6201671
Mean: 1719.97418
Median Absolute Deviation (MAD): 473.97
Skewness: 23.72814196
Sum: 9809012.75
Variance: 54183517.73
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                 Count  Frequency (%)
7.95                  9      0.2%
1.25                  8      0.1%
2.95                  8      0.1%
4.95                  8      0.1%
12.75                 7      0.1%
1.65                  7      0.1%
0                     7      0.1%
3.75                  7      0.1%
5.95                  6      0.1%
7.5                   6      0.1%
Other values (5448)   5630   98.7%

Value                 Count  Frequency (%)
0                     7      0.1%
1.776356839 × 10^-15  1      < 0.1%
3.552713679 × 10^-15  2      < 0.1%
1.065814104 × 10^-14  1      < 0.1%
5.684341886 × 10^-14  1      < 0.1%
0.42                  1      < 0.1%
0.65                  1      < 0.1%
0.79                  1      < 0.1%
0.84                  4      0.1%
0.85                  3      0.1%

Value                 Count  Frequency (%)
278778.02             1      < 0.1%
259657.3              1      < 0.1%
189735.53             1      < 0.1%
133007.13             1      < 0.1%
123638.18             1      < 0.1%
114505.32             1      < 0.1%
88138.2               1      < 0.1%
65920.12              1      < 0.1%
62924.1               1      < 0.1%
59419.34              1      < 0.1%

mean_ticket
Real number (ℝ≥0)

SKEWED

Distinct: 5508
Distinct (%): 96.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 28.22121083
Minimum: 0
Maximum: 9904.875
Zeros: 7
Zeros (%): 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 89.1 KiB

Quantile statistics

Minimum: 0
5th percentile: 3.3502
Q1: 7.821939589
Median: 15.22257143
Q3: 21.20793145
95th percentile: 71.26302198
Maximum: 9904.875
Range: 9904.875
Interquartile range (IQR): 13.38599186

Descriptive statistics

Standard deviation: 158.8876329
Coefficient of variation (CV): 5.630078519
Kurtosis: 2707.395005
Mean: 28.22121083
Median Absolute Deviation (MAD): 7.041078891
Skewness: 46.60155412
Sum: 160945.5654
Variance: 25245.27989
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                 Count  Frequency (%)
3.75                  11     0.2%
4.95                  10     0.2%
1.25                  9      0.2%
2.95                  9      0.2%
7.95                  8      0.1%
1.65                  7      0.1%
0                     7      0.1%
12.75                 7      0.1%
8.25                  7      0.1%
5.95                  6      0.1%
Other values (5498)   5622   98.6%

Value                 Count  Frequency (%)
0                     7      0.1%
1.758769148 × 10^-17  1      < 0.1%
8.07434927 × 10^-17   1      < 0.1%
2.537652628 × 10^-16  1      < 0.1%
4.844609562 × 10^-16  1      < 0.1%
1.894780629 × 10^-15  1      < 0.1%
0.36                  1      < 0.1%
0.42                  3      0.1%
0.535                 1      < 0.1%
0.65                  1      < 0.1%

Value                 Count  Frequency (%)
9904.875              1      < 0.1%
3861                  1      < 0.1%
3096                  1      < 0.1%
1687.2                1      < 0.1%
1435.726667           1      < 0.1%
1377.077778           1      < 0.1%
1001.2                1      < 0.1%
931.5                 1      < 0.1%
872.13                1      < 0.1%
743.8                 1      < 0.1%

qtd_purchases
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 56
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.46887603
Minimum: 1
Maximum: 206
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 89.1 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 1
Median: 1
Q3: 4
95th percentile: 11
Maximum: 206
Range: 205
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 6.808862958
Coefficient of variation (CV): 1.962844131
Kurtosis: 302.4748716
Mean: 3.46887603
Median Absolute Deviation (MAD): 0
Skewness: 13.20103788
Sum: 19783
Variance: 46.36061478
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                 Count  Frequency (%)
1                     2874   50.4%
2                     829    14.5%
3                     504    8.8%
4                     394    6.9%
5                     237    4.2%
6                     173    3.0%
7                     138    2.4%
8                     98     1.7%
9                     69     1.2%
10                    55     1.0%
Other values (46)     332    5.8%

Value                 Count  Frequency (%)
1                     2874   50.4%
2                     829    14.5%
3                     504    8.8%
4                     394    6.9%
5                     237    4.2%
6                     173    3.0%
7                     138    2.4%
8                     98     1.7%
9                     69     1.2%
10                    55     1.0%

Value                 Count  Frequency (%)
206                   1      < 0.1%
199                   1      < 0.1%
124                   1      < 0.1%
97                    1      < 0.1%
91                    2      < 0.1%
86                    1      < 0.1%
72                    1      < 0.1%
62                    2      < 0.1%
60                    1      < 0.1%
57                    1      < 0.1%

qtd_products
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED

Distinct: 1840
Distinct (%): 32.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 977.833421
Minimum: 1
Maximum: 196844
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 89.1 KiB

Quantile statistics

Minimum: 1
5th percentile: 4.1
Q1: 106
Median: 318
Q3: 804
95th percentile: 2941.4
Maximum: 196844
Range: 196843
Interquartile range (IQR): 698

Descriptive statistics

Standard deviation: 4425.980472
Coefficient of variation (CV): 4.526313354
Kurtosis: 786.4421989
Mean: 977.833421
Median Absolute Deviation (MAD): 253
Skewness: 23.07183204
Sum: 5576584
Variance: 19589303.14
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                 Count  Frequency (%)
1                     113    2.0%
2                     73     1.3%
3                     51     0.9%
4                     49     0.9%
5                     35     0.6%
6                     29     0.5%
12                    25     0.4%
88                    22     0.4%
72                    21     0.4%
7                     20     0.4%
Other values (1830)   5265   92.3%

Value                 Count  Frequency (%)
1                     113    2.0%
2                     73     1.3%
3                     51     0.9%
4                     49     0.9%
5                     35     0.6%
6                     29     0.5%
7                     20     0.4%
8                     18     0.3%
9                     7      0.1%
10                    17     0.3%

Value                 Count  Frequency (%)
196844                1      < 0.1%
80997                 1      < 0.1%
80263                 1      < 0.1%
77373                 1      < 0.1%
74215                 1      < 0.1%
69993                 1      < 0.1%
64549                 1      < 0.1%
64124                 1      < 0.1%
63312                 1      < 0.1%
58343                 1      < 0.1%

avg_products
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED

Distinct: 2370
Distinct (%): 41.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 268.240499
Minimum: 1
Maximum: 74215
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 89.1 KiB

Quantile statistics

Minimum: 1
5th percentile: 4
Q1: 75
Median: 152
Q3: 290.7083333
95th percentile: 733.9375
Maximum: 74215
Range: 74214
Interquartile range (IQR): 215.7083333

Descriptive statistics

Standard deviation: 1198.360557
Coefficient of variation (CV): 4.467485565
Kurtosis: 2772.238185
Mean: 268.240499
Median Absolute Deviation (MAD): 96.5
Skewness: 48.56980066
Sum: 1529775.566
Variance: 1436068.025
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                 Count  Frequency (%)
1                     114    2.0%
2                     72     1.3%
3                     51     0.9%
4                     49     0.9%
5                     35     0.6%
6                     29     0.5%
12                    26     0.5%
100                   22     0.4%
72                    22     0.4%
73                    21     0.4%
Other values (2360)   5262   92.3%

Value                 Count  Frequency (%)
1                     114    2.0%
2                     72     1.3%
3                     51     0.9%
3.333333333           1      < 0.1%
4                     49     0.9%
5                     35     0.6%
5.333333333           1      < 0.1%
5.666666667           1      < 0.1%
6                     29     0.5%
6.142857143           1      < 0.1%

Value                 Count  Frequency (%)
74215                 1      < 0.1%
40498.5               1      < 0.1%
14149                 1      < 0.1%
13956                 1      < 0.1%
7824                  1      < 0.1%
6009.333333           1      < 0.1%
5963                  1      < 0.1%
5197                  1      < 0.1%
4300                  1      < 0.1%
4282                  1      < 0.1%

avg_unique_products
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 1171
Distinct (%): 20.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 37.25704339
Minimum: 0.2
Maximum: 1109
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 89.1 KiB

Quantile statistics

Minimum: 0.2
5th percentile: 1
Q1: 7.267857143
Median: 15.07692308
Q3: 31
95th percentile: 173
Maximum: 1109
Range: 1108.8
Interquartile range (IQR): 23.73214286

Descriptive statistics

Standard deviation: 76.83231525
Coefficient of variation (CV): 2.062222556
Kurtosis: 32.93259067
Mean: 37.25704339
Median Absolute Deviation (MAD): 9.923076923
Skewness: 5.076421351
Sum: 212476.9185
Variance: 5903.204666
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                 Count  Frequency (%)
1                     277    4.9%
2                     161    2.8%
3                     115    2.0%
9                     105    1.8%
10                    105    1.8%
8                     103    1.8%
7                     101    1.8%
6                     101    1.8%
5                     100    1.8%
13                    97     1.7%
Other values (1161)   4438   77.8%

Value                 Count  Frequency (%)
0.2                   1      < 0.1%
0.25                  3      0.1%
0.3333333333          7      0.1%
0.4                   1      < 0.1%
0.4090909091          1      < 0.1%
0.5                   12     0.2%
0.5454545455          1      < 0.1%
0.5555555556          1      < 0.1%
0.5714285714          1      < 0.1%
0.6176470588          1      < 0.1%

Value                 Count  Frequency (%)
1109                  1      < 0.1%
748                   1      < 0.1%
730                   1      < 0.1%
720                   1      < 0.1%
703                   1      < 0.1%
686                   1      < 0.1%
675                   1      < 0.1%
673                   1      < 0.1%
660                   1      < 0.1%
649                   1      < 0.1%

recency_days
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 304
Distinct (%): 5.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 116.8099246
Minimum: 0
Maximum: 373
Zeros: 38
Zeros (%): 0.7%
Negative: 0
Negative (%): 0.0%
Memory size: 89.1 KiB

Quantile statistics

Minimum: 0
5th percentile: 3
Q1: 23
Median: 71
Q3: 199
95th percentile: 338
Maximum: 373
Range: 373
Interquartile range (IQR): 176

Descriptive statistics

Standard deviation: 111.5190071
Coefficient of variation (CV): 0.9547048975
Kurtosis: -0.637244736
Mean: 116.8099246
Median Absolute Deviation (MAD): 61
Skewness: 0.8158475767
Sum: 666167
Variance: 12436.48894
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                 Count  Frequency (%)
1                     110    1.9%
4                     105    1.8%
3                     98     1.7%
2                     92     1.6%
10                    86     1.5%
8                     82     1.4%
17                    79     1.4%
9                     79     1.4%
7                     78     1.4%
15                    67     1.2%
Other values (294)    4827   84.6%

Value                 Count  Frequency (%)
0                     38     0.7%
1                     110    1.9%
2                     92     1.6%
3                     98     1.7%
4                     105    1.8%
5                     52     0.9%
7                     78     1.4%
8                     82     1.4%
9                     79     1.4%
10                    86     1.5%

Value                 Count  Frequency (%)
373                   23     0.4%
372                   22     0.4%
371                   17     0.3%
369                   4      0.1%
368                   13     0.2%
367                   16     0.3%
366                   15     0.3%
365                   19     0.3%
364                   11     0.2%
362                   7      0.1%

purchase_frequency
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 1226
Distinct (%): 21.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5474289192
Minimum: 0.005449591281
Maximum: 17
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 89.1 KiB

Quantile statistics

Minimum: 0.005449591281
5th percentile: 0.01103144296
Q1: 0.02496537396
Median: 1
Q3: 1
95th percentile: 1
Maximum: 17
Range: 16.99455041
Interquartile range (IQR): 0.975034626

Descriptive statistics

Standard deviation: 0.5504743093
Coefficient of variation (CV): 1.005563079
Kurtosis: 138.7183578
Mean: 0.5474289192
Median Absolute Deviation (MAD): 0
Skewness: 4.848314544
Sum: 3121.987126
Variance: 0.3030219651
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                 Count  Frequency (%)
1                     2882   50.5%
2                     48     0.8%
0.0625                18     0.3%
0.02777777778         17     0.3%
0.02380952381         16     0.3%
0.08333333333         15     0.3%
0.03448275862         15     0.3%
0.09090909091         15     0.3%
0.02941176471         14     0.2%
0.03571428571         13     0.2%
Other values (1216)   2650   46.5%

Value                 Count  Frequency (%)
0.005449591281        1      < 0.1%
0.005464480874        1      < 0.1%
0.005479452055        1      < 0.1%
0.005494505495        1      < 0.1%
0.005586592179        2      < 0.1%
0.005602240896        1      < 0.1%
0.005617977528        2      < 0.1%
0.00566572238         1      < 0.1%
0.005681818182        2      < 0.1%
0.005698005698        3      0.1%

Value                 Count  Frequency (%)
17                    1      < 0.1%
4                     1      < 0.1%
3                     5      0.1%
2                     48     0.8%
1.142857143           1      < 0.1%
1                     2882   50.5%
0.75                  1      < 0.1%
0.6666666667          3      0.1%
0.550802139           1      < 0.1%
0.5335120643          1      < 0.1%

qtd_cancelled_products
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 213
Distinct (%): 3.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 45.19480975
Minimum: 0
Maximum: 80995
Zeros: 4200
Zeros (%): 73.6%
Negative: 0
Negative (%): 0.0%
Memory size: 89.1 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 1
95th percentile: 38
Maximum: 80995
Range: 80995
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1468.516176
Coefficient of variation (CV): 32.49302705
Kurtosis: 2760.820132
Mean: 45.19480975
Median Absolute Deviation (MAD): 0
Skewness: 52.07018364
Sum: 257746
Variance: 2156539.759
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                 Count  Frequency (%)
0                     4200   73.6%
1                     169    3.0%
2                     151    2.6%
3                     105    1.8%
4                     89     1.6%
6                     78     1.4%
5                     61     1.1%
12                    52     0.9%
7                     44     0.8%
8                     43     0.8%
Other values (203)    711    12.5%

Value                 Count  Frequency (%)
0                     4200   73.6%
1                     169    3.0%
2                     151    2.6%
3                     105    1.8%
4                     89     1.6%
5                     61     1.1%
6                     78     1.4%
7                     44     0.8%
8                     43     0.8%
9                     41     0.7%

Value                 Count  Frequency (%)
80995                 1      < 0.1%
74215                 1      < 0.1%
9014                  1      < 0.1%
8004                  1      < 0.1%
4427                  1      < 0.1%
3768                  1      < 0.1%
3332                  1      < 0.1%
2878                  1      < 0.1%
2022                  1      < 0.1%
2012                  1      < 0.1%

cancelled_over_purchased
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 1380
Distinct (%): 24.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.012446537
Minimum: 0
Maximum: 1
Zeros: 4200
Zeros (%): 73.6%
Negative: 0
Negative (%): 0.0%
Memory size: 89.1 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0.001084013706
95th percentile: 0.04119629874
Maximum: 1
Range: 1
Interquartile range (IQR): 0.001084013706

Descriptive statistics

Standard deviation: 0.06760595652
Coefficient of variation (CV): 5.431708154
Kurtosis: 128.7915174
Mean: 0.012446537
Median Absolute Deviation (MAD): 0
Skewness: 10.38988063
Sum: 70.98260053
Variance: 0.004570565357
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                 Count  Frequency (%)
0                     4200   73.6%
1                     12     0.2%
0.01075268817         4      0.1%
0.009661835749        4      0.1%
0.009259259259        3      0.1%
0.5                   3      0.1%
0.03846153846         3      0.1%
0.0243902439          3      0.1%
0.02380952381         3      0.1%
0.007462686567        3      0.1%
Other values (1370)   1465   25.7%

Value                 Count  Frequency (%)
0                     4200   73.6%
0.0001169636243       1      < 0.1%
0.0001839926403       1      < 0.1%
0.0002816901408       1      < 0.1%
0.0003140703518       1      < 0.1%
0.0003619254434       1      < 0.1%
0.0003632401017       1      < 0.1%
0.0003637686431       1      < 0.1%
0.0003671071953       1      < 0.1%
0.000393081761        1      < 0.1%

Value                 Count  Frequency (%)
1                     12     0.2%
0.9999753077          1      < 0.1%
0.9863013699          1      < 0.1%
0.8333333333          1      < 0.1%
0.6333333333          1      < 0.1%
0.6115107914          1      < 0.1%
0.6008836524          1      < 0.1%
0.5964566929          1      < 0.1%
0.5648854962          1      < 0.1%
0.5646387833          1      < 0.1%

Interactions

[Pairwise interaction plots for the 11 variables (11 × 11 grid); images not preserved.]

Correlations


Auto

The auto setting is an easily interpretable pairwise column metric that picks a method from the types of the two variables, using the following mapping (vartype-vartype : method): categorical-categorical : Cramér's V; numerical-categorical : Cramér's V (using a discretized numerical column); numerical-numerical : Spearman's ρ. This configuration uses the most suitable method for each pair of columns.
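
The mapping can be written as a small lookup. The following is only an illustration of the rule as stated in this report, not the library's implementation; `auto_method` and the type labels "numerical" and "categorical" are hypothetical names chosen for the sketch.

```python
def auto_method(type_a: str, type_b: str) -> str:
    """Return the method named by the auto mapping for a pair of variable types.

    Hypothetical helper: it mirrors the mapping described in the report text only.
    """
    allowed = {"numerical", "categorical"}
    if type_a not in allowed or type_b not in allowed:
        raise ValueError(f"unsupported variable types: {type_a!r}, {type_b!r}")
    if type_a == "numerical" and type_b == "numerical":
        return "Spearman's rho"
    # categorical-categorical and numerical-categorical both map to Cramer's V;
    # in the mixed case the numerical column is discretized first.
    return "Cramer's V"


print(auto_method("numerical", "numerical"))
print(auto_method("numerical", "categorical"))
```

Because all 11 variables in this dataset are numeric, every pair falls into the numerical-numerical branch.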

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at capturing nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
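
The calculation can be sketched in plain Python as follows. This is a minimal illustration under the assumption that tied values receive the average of their ranks; the function names and the toy data are hypothetical and are not taken from the profiled dataset.

```python
from math import sqrt


def average_ranks(values):
    """Assign 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Toy data: y = x**3 is nonlinear but strictly increasing.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rho(x, y), pearson_r(x, y))
```

On the toy data the rank correlation is 1 up to floating-point error while Pearson's r is lower, which is the sense in which ρ captures nonlinear monotonic relationships.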

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
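
A minimal sketch of this calculation in plain Python is shown below, together with a check of the invariance under changes in location and scale. The function name and the toy data are hypothetical; the scale factors used are positive, since a negative factor would flip the sign of r.

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations.

    The common 1/(n - 1) factor cancels between numerator and denominator,
    so plain sums of deviations are used.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Toy data, roughly linear.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

r = pearson_r(x, y)
# Shift and rescale each variable separately; r should not change.
r_transformed = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
print(r, r_transformed)
```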

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
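
The definition above corresponds to the variant usually called tau-a. The sketch below implements exactly that formula in plain Python; it is an illustration with hypothetical names and toy data, and it makes no claim about which variant or tie correction the report's own computation uses.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs.

    Pairs tied in x or y count as neither concordant nor discordant.
    """
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Toy data: three observations, one swapped pair.
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```

With three observations there are three pairs; two are concordant and one is discordant, so the sketch returns (2 - 1) / 3.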

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.

Sample

First rows

index  customer_id  gross_revenue  mean_ticket  qtd_purchases  qtd_products  avg_products  avg_unique_products  recency_days  purchase_frequency  qtd_cancelled_products  cancelled_over_purchased
0      17850        5288.63        16.95        34             1733          50.97         0.62                 372           17.00               40                      0.02
1      13047        3089.10        15.92        9              1390          154.44        11.67                56            0.03                35                      0.03
2      12583        6629.34        28.21        15             5028          335.20        7.60                 2             0.04                50                      0.01
3      13748        948.25         33.87        5              439           87.80         4.80                 95            0.02                0                       0.00
4      15100        635.10         105.85       3              80            26.67         0.33                 333           0.07                22                      0.28
5      15291        4551.51        42.14        14             2102          150.14        4.36                 25            0.04                29                      0.01
6      14688        5107.38        14.23        21             3621          172.43        7.05                 7             0.06                399                     0.11
7      17809        5344.85        84.84        12             2057          171.42        3.83                 16            0.03                41                      0.02
8      15311        59419.34       23.85        91             38194         419.71        6.23                 0             0.24                474                     0.01
9      16098        2005.63        29.93        7              613           87.57         4.86                 87            0.02                0                       0.00

Last rows

index  customer_id  gross_revenue  mean_ticket  qtd_purchases  qtd_products  avg_products  avg_unique_products  recency_days  purchase_frequency  qtd_cancelled_products  cancelled_over_purchased
5693   22700        4839.42        78.06        1              1074          1074.00       55.00                1             1.00                0                       0.00
5694   13298        360.00         180.00       1              96            96.00         2.00                 1             1.00                0                       0.00
5695   14569        227.39         18.95        1              79            79.00         10.00                1             1.00                0                       0.00
5696   22704        17.90          2.56         1              14            14.00         7.00                 1             1.00                0                       0.00
5697   22705        3.35           1.68         1              2             2.00          2.00                 1             1.00                0                       0.00
5698   22706        5699.00        8.99         1              1747          1747.00       634.00               1             1.00                0                       0.00
5699   22707        6756.06        9.25         1              2010          2010.00       730.00               0             1.00                0                       0.00
5700   22708        3217.20        54.53        1              654           654.00        56.00                0             1.00                0                       0.00
5701   22709        3950.72        18.21        1              731           731.00        217.00               0             1.00                0                       0.00
5702   12713        794.55         21.47        1              505           505.00        37.00                0             1.00                0                       0.00